Triage 27 dependabot alerts to zero + harden vLLM 0.21.0 build docs #3
Merged
Audit before: 27 dependabot alerts (1 critical, 8 high, 17 moderate, 1 low) / 14 npm audit findings across five root-cause packages. After: 0 vulnerabilities reported by either tool.

Resolution path, in order of leverage:

- Drop @langchain/community. It was never imported. Removing it kills the entire @langchain/community → ibm-cloud-sdk-core → axios chain (11 axios CVEs, follow-redirects header leak, and a stagehand-pulled ws instance), along with a transitive langsmith copy.
- Replace @xenova/transformers (2.17.2, abandoned at this version) with @huggingface/transformers (4.2.0, the maintained successor). xenova pinned onnxruntime-web@1.14.0, which pinned onnx-proto@4.0.4, which pinned protobufjs@6.11.4, which is vulnerable to a CVSS-9.8 RCE plus eight other advisories with no upstream fix coming. The HF package ships onnxruntime-web@1.26.x and modern protobuf handling, and its pipeline() / FeatureExtractionPipeline surface is signature-identical to xenova. Migration is a single import line; the runtime contract for LocalTransformersEmbeddings is unchanged.
- Bump direct uuid to ^13.0.2 to clear the buffer-bounds advisory on v3/v5/v6. We use uuidv5 for both chunk IDs and Qdrant point IDs, so this is on the hot path.
- Pin transitive uuid, langsmith, and ws via npm overrides so the fixes hold even when @langchain/core or openai resolves to older ranges of those packages.

Verification deferred to runtime: the HF transformers swap changes the ONNX execution path. Embedding vectors should be byte-equivalent against the same Xenova/all-MiniLM-L6-v2 model, but this hasn't been smoke-tested end-to-end. If output shifts, the existing Qdrant collection's points become unreachable to new queries; drop and re-ingest after merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
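The deferred embedding check in this commit message could be done offline along the following lines. This is a minimal sketch, not part of the PR: it assumes the same text has been embedded once with each package and both vectors exported as plain lists of floats. The names `cosine_similarity` and `compare_embeddings` and the `0.9999` threshold are illustrative assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def compare_embeddings(
    old: Sequence[float],
    new: Sequence[float],
    min_cosine: float = 0.9999,  # assumed tolerance, tune for your data
) -> str:
    """Classify the drift between a pre-swap and a post-swap embedding.

    "identical":  every component is exactly equal.
    "equivalent": not exactly equal, but cosine similarity >= min_cosine.
    "shifted":    similarity below the threshold; re-ingest is likely needed.
    """
    if list(old) == list(new):
        return "identical"
    if cosine_similarity(old, new) >= min_cosine:
        return "equivalent"
    return "shifted"
```

A result of "shifted" on a handful of representative chunks would be the signal to drop and re-ingest the collection, as the message above describes.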
Currency update on top of the npm triage. vLLM 0.19.0 is two minor versions behind upstream (0.21.0 is current) and the documented build procedure was written around 0.19.0 specifically. v0.20.0 and v0.21.0 introduced three breaking deltas that affect this guide:

- C++20 build requirement (v0.21.0). gcc-10+ required. Ubuntu 22.04's default gcc-11 and 24.04's default gcc-13 both satisfy this; flagged in the prerequisites and the troubleshooting table for older distros.
- PyTorch 2.11 minimum (v0.20.0). The existing nightly index resolves to 2.11+ automatically; just a wording change to note the floor.
- HuggingFace transformers v5 required (v0.20.0). Picked up transitively by the build; no install-step change.

The RDNA3-specific workarounds (BUILD_FA=0, --enforce-eager, ROCM_ATTN backend, no FP8, no hipBLASLt, hipBLAS+TunableOp) are all unchanged. The release notes make no mention of gfx1100 CUDA-graph stability fixes, so --enforce-eager stays on for now.

Also bumps the ROCm minor recommendation from "7.2.x" to "7.2.2", which v0.21.0 references explicitly (#41386).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
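The three version floors listed in this commit message can be checked mechanically before starting a build. The sketch below is an illustration rather than anything shipped in the PR: `FLOORS`, `parse_version`, `meets_floor`, and `check_all` are hypothetical names, and the comparison only looks at the leading numeric part of each version string, so a pre-release such as `2.11.0a0` is treated as meeting the 2.11 floor.

```python
import re
from typing import Dict, Tuple

# Minimums taken from the commit message above.
FLOORS: Dict[str, Tuple[int, ...]] = {
    "gcc": (10,),
    "torch": (2, 11),
    "transformers": (5,),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Return the leading dotted-number part, e.g. '2.11.0.dev1+rocm7.2' -> (2, 11, 0)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no leading version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_floor(version: str, floor: Tuple[int, ...]) -> bool:
    """True if `version` is at or above `floor`, comparing numeric parts only."""
    parsed = parse_version(version)
    # Pad both tuples with zeros so that '5' compares equal to (5, 0, 0).
    width = max(len(parsed), len(floor))
    padded_version = parsed + (0,) * (width - len(parsed))
    padded_floor = floor + (0,) * (width - len(floor))
    return padded_version >= padded_floor


def check_all(found: Dict[str, str]) -> Dict[str, bool]:
    """Map each tool named in FLOORS to whether its reported version is high enough."""
    return {name: meets_floor(found[name], floor) for name, floor in FLOORS.items()}
```

In practice the inputs would come from `gcc -dumpversion`, `torch.__version__`, and `transformers.__version__` on the build host.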
Four findings from validating the v0.21.0 doc bump against an actual ROCm 7.2.1 / gfx1100 host today, each of which would have saved an hour or so if the doc had covered it:

- Flip the default torch nightly index from rocm6.4 → rocm7.2. The rocm7.2 index is now published; the older rocm6.4 wheels carry a HIP ABI that doesn't match /opt/rocm 7.2 and produce `undefined symbol: _ZN3c10*` at runtime when vLLM's compiled extensions try to load against the mismatched libtorch.
- Add CMAKE_ARGS="-DHIP_FOUND=TRUE" to the build env. The rocm7.2 nightly torch's bundled LoadHIP.cmake detects HIP correctly and prints every ROCm library version but no longer exports the HIP_FOUND global variable that vLLM's CMakeLists.txt:151 checks. Without the override the build dies with "Can't find CUDA or HIP installation" despite HIP being clearly present in the configure log.
- Add a `python -c "import vllm._C, vllm._rocm_C"` smoke immediately after the build. Necessary because opt-125m's code path uses Python fallbacks and will load + generate even with stale/broken .so files, which makes the existing smoke test a false positive on a real ABI problem. The Llama-architecture model is the first thing that exercises kernels registered by _C.
- Restructure §8 into two stages: opt-125m for build sanity, then Llama-3.1-8B for kernel sanity. Update the "expected log lines" prefix to the new v0.21.0 TURBOQUANT-rejection wording, document the harmless first-request Triton JIT-compile warnings introduced by the new jit_monitor, and flag the "Cannot use ROCm custom paged attention kernel, falling back to Triton" line as expected on RDNA3 (it's a fallback *within* ROCM_ATTN, not a fallback *from* it).

Troubleshooting table gains rows for: the HIP_FOUND CMake error, the torch-ABI undefined-symbol case, the opt-125m false-pass scenario, the JIT-monitor warnings, and the paged-attention Triton fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
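The one-line import smoke from this commit message stops at the first failure. A slightly longer variant, sketched below, reports every compiled extension that fails to load. It is an illustration only: `check_imports` and `exit_code` are hypothetical helper names, while the two module names are the ones given above.

```python
import importlib
from typing import Dict, Iterable, Optional

# Compiled extension modules named in the commit message above.
VLLM_EXTENSIONS = ("vllm._C", "vllm._rocm_C")


def check_imports(module_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try to import each module; map its name to None on success or an error string."""
    results: Dict[str, Optional[str]] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            # A missing module raises ModuleNotFoundError; a shared object that
            # fails to load (for example on an undefined symbol) raises ImportError.
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None
    return results


def exit_code(results: Dict[str, Optional[str]]) -> int:
    """Print one line per failed import and return 1 if any failed, else 0."""
    failures = {name: error for name, error in results.items() if error is not None}
    for name, error in failures.items():
        print(f"FAIL {name}: {error}")
    return 1 if failures else 0
```

A build script could end with `sys.exit(exit_code(check_imports(VLLM_EXTENSIONS)))` so that a stale or ABI-mismatched `.so` fails the build step instead of surfacing later in the Llama stage.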
Summary
Three related currency / hygiene updates in one PR. Each commit is independently reviewable.
Commit 1: Triage all 27 dependabot alerts to zero
Drives the open advisory count from 27 dependabot alerts (1 critical, 8 high, 17 moderate, 1 low) down to zero; `npm audit` from 14 findings to 0. Five vulnerable packages collapse to a small set of fixes:

- Drop `@langchain/community`: never imported in `src/`. Removing it kills the `ibm-cloud-sdk-core → axios → follow-redirects` chain (11 axios CVEs incl. prototype-pollution highs) plus a transitive `ws` and a redundant `langsmith` copy.
- Replace `@xenova/transformers` (2.17.2, abandoned) with `@huggingface/transformers` (4.2.0): removes the transitive `protobufjs@6.11.4`, which carries the CVSS-9.8 Arbitrary Code Execution advisory plus 8 others. HF ships `onnxruntime-web@1.26.x`. One-line import swap; `pipeline()` and `FeatureExtractionPipeline` are signature-identical.
- Bump `uuid` to `^13.0.2`: clears the buffer-bounds advisory that affects `uuidv5` (hot path: chunk IDs and Qdrant point IDs).
- Add `overrides` for transitive `uuid`, `langsmith`, `ws`: `@langchain/core` / `openai` would otherwise pull older ranges.

Commit 2: Bump vLLM build docs to v0.21.0
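v0.19.0 → v0.21.0 introduces three breaking deltas worth flagging in `vllm-setup.md`:

- C++20 build requirement (v0.21.0). Needs gcc-10+; the Ubuntu 22.04 and 24.04 default compilers both satisfy this.
- PyTorch 2.11 minimum (v0.20.0). The existing nightly index already resolves to 2.11+.
- `transformers ≥ 5` (v0.20.0). Picked up transitively, no manual step.

The RDNA3 workarounds (`BUILD_FA=0`, `--enforce-eager`, ROCM_ATTN backend, no FP8 / hipBLASLt) are unchanged.

Commit 3: Harden the doc with empirical lessons from validating v0.21.0 on real hardware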
Four gaps in commit 2 surfaced when actually rebuilding against ROCm 7.2.1 / gfx1100, all now documented:

- Flip the default torch nightly index from `rocm6.4` → `rocm7.2`. The `rocm7.2` index is live; older `rocm6.4` wheels embed a HIP ABI that mismatches `/opt/rocm` 7.2 and produce `undefined symbol: _ZN3c10*` at runtime when vLLM's compiled extensions load against the wrong libtorch.
- Add `CMAKE_ARGS="-DHIP_FOUND=TRUE"` to the build env. The rocm7.2 nightly torch's bundled `LoadHIP.cmake` detects HIP correctly but no longer exports the `HIP_FOUND` global that vLLM's `CMakeLists.txt:151` checks. Without the override the build dies with "Can't find CUDA or HIP installation" despite HIP being clearly present in the configure log.
- Add a `python -c "import vllm._C, vllm._rocm_C"` smoke immediately after the build. opt-125m alone is a false-positive sanity check: its code path uses Python fallbacks and will load + generate even with stale/broken `.so` files. The Llama-architecture model is the first thing that exercises kernels registered by `_C`, and fails at `SiluAndMul.__init__` with `AttributeError: '_OpNamespace' '_C' object has no attribute 'silu_and_mul'` when the extensions didn't load.
- Restructure §8 into two stages (opt-125m for build sanity, then Llama-3.1-8B for kernel sanity) and document the expected log lines: the harmless first-request Triton JIT-compile warnings (new `jit_monitor` feature) and the "Cannot use ROCm custom paged attention kernel, falling back to Triton" line (a fallback within ROCM_ATTN, not from it; normal on RDNA3).

Troubleshooting table gains five new rows: HIP_FOUND CMake error, torch-ABI `undefined symbol`, opt-125m false-pass, `jit_monitor` warnings, paged-attention Triton fallback.

Test plan
- `npm audit` → 0 vulnerabilities
- `npx tsc --noEmit` clean
- `vllm-setup.md` (validated by the ROCm 7.2.1 / gfx1100 rebuild described above)
- `make vllm` brings up Llama-3.1-8B on `:8000`; `curl /v1/models` returns the model id
- `npm start` runs the full pipeline: ingests for two tenants, returns grounded answers with parsed citations, cross-tenant probe returns "Not supported by available context."
- `points_count` stays stable on re-ingest (4 tenant-isolated points after cleanup of 2 pre-migration legacy points)

Operational notes
Post-merge migration. If you have an existing `saas_docs` Qdrant collection from before this PR, you'll find pre-PR points sitting alongside the new tenant-isolated ones. They're correctly excluded by the tenant filter, but they're noise. Clean with:

🤖 Generated with Claude Code